Probabilistic Speaker-Class based Acoustic Modeling for Large Vocabulary Continuous Speech Recognition

نویسندگان

  • Xiangang Li
  • Dan Su
  • Zaihu Pang
  • Xihong Wu
چکیده

In this paper, a probabilistic speaker-class (PSC) based acoustic modeling method is proposed for taking into account speaker variability influence in HMM-based speech recognition systems. Firstly, within the context of speaker-class based speech recognition, an experiment is conducted to investigate the performance of speaker-class recognition based on hard-cut speaker clustering. Then, in the proposed method, through introducing the probabilistic latent speaker analysis, the speaker-class dependent acoustic models are trained based on a soft-decision speaker clustering method, and combined by the distribution of speaker-class in the decoding phase. The experiments were conducted on a 600-hour speech corpus, and showed improvement in a large vocabulary continuous speech recognition task.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Session 3: Continuous Speech Recognition

The papers in this session focus on techniques for and applications of large-vocabulary continuous speech recognition. The technique oriented papers discuss techniques for channel compensation, fast search, acoustic modeling, and adaptive language modeling. The applications oriented papers discuss methods for using recognizers for language identification, speaker identification, speakersex iden...

متن کامل

Probabilistic latent speaker training for large vocabulary speech recognition

In this paper, we describe an improvement on probabilistic latent speaker analysis method and investigate the use of probabilistic latent speaker analysis for acoustic model training. By performing co-occurrence analysis between speaker and dominant components, speaker variation is dealt with based on different trajectories. Speech recognition experiment results show that our method, although w...

متن کامل

Embedding-Based Speaker Adaptive Training of Deep Neural Networks

An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are a constant given a particular speaker, are mapped through a control network to layer-dependent elementwise affine transformations to canonicalize the internal feature representations at the output...

متن کامل

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...

متن کامل

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012